Markovian noise
First Order Methods with Markovian Noise: from Acceleration to Variational Inequalities
This paper studies stochastic optimization problems with Markovian noise. We present a unified approach to the theoretical analysis of first-order gradient methods for stochastic optimization and variational inequalities. Our approach covers both non-convex and strongly convex minimization problems. To achieve an optimal (linear) dependence on the mixing time of the underlying noise sequence, we use a randomized batching scheme based on the multilevel Monte Carlo (MLMC) method. Moreover, our technique allows us to eliminate the limiting assumptions of previous research on Markov noise, such as the need for a bounded domain and uniformly bounded stochastic gradients. To the best of our knowledge, our extension to variational inequalities under Markovian noise is the first of its kind. Additionally, we provide lower bounds that match the oracle complexity of our method in the case of strongly convex optimization problems.
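To make the randomized batching idea concrete, here is a minimal Python sketch of an MLMC-style gradient estimator over a single Markov chain trajectory: a random level $J$ is drawn geometrically, the chain is rolled forward for $2^J$ steps, and an importance-weighted correction refines the cheap one-sample estimate. The interface (`grad_fn`, `sample_next`, the truncation level `j_max`) is assumed for illustration only and is not the paper's exact estimator or choice of constants.

```python
import numpy as np

def mlmc_gradient(grad_fn, sample_next, z0, j_max=10, rng=None):
    """Illustrative MLMC-style gradient estimator under Markovian sampling.

    grad_fn(z)      -> stochastic gradient at the current iterate, using chain state z.
    sample_next(z)  -> next state of the underlying Markov chain.
    z0              -> current chain state; returned updated so the chain continues
                       across optimization steps.
    All names and defaults here are assumptions for this sketch.
    """
    rng = rng or np.random.default_rng()
    J = int(rng.geometric(0.5))        # random level, P(J = j) = 2**(-j), j = 1, 2, ...
    if J > j_max:
        # Truncated level: fall back to the plain one-sample estimate.
        return np.asarray(grad_fn(z0)), sample_next(z0)

    # Roll the chain forward for 2**J steps, recording a gradient at each state.
    T = 2 ** J
    grads, z = [], z0
    for _ in range(T):
        grads.append(np.asarray(grad_fn(z)))
        z = sample_next(z)
    grads = np.stack(grads)

    g0 = grads[0]                      # cheap but biased single-sample estimate
    gJ = grads.mean(axis=0)            # average over the full batch of 2**J samples
    gJ_half = grads[: T // 2].mean(axis=0)
    # Importance-weighted correction: in expectation it promotes g0 toward a
    # large-batch average while keeping the expected per-step cost logarithmic.
    return g0 + (2 ** J) * (gJ - gJ_half), z
```

The expected number of chain steps per call stays logarithmic in the truncation level, which is what lets this kind of estimator trade bias against cost when samples are correlated.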
A Minimal-Assumption Analysis of Q-Learning with Time-Varying Policies
In this work, we present the first finite-time analysis of the Q-learning algorithm under time-varying learning policies (i.e., on-policy sampling) with minimal assumptions; specifically, we assume only the existence of a policy that induces an irreducible Markov chain over the state space. We establish a last-iterate convergence rate for $\mathbb{E}[\|Q_k - Q^*\|_\infty^2]$, implying a sample complexity of order $O(1/\varepsilon^2)$ for achieving $\mathbb{E}[\|Q_k - Q^*\|_\infty] \le \varepsilon$, matching that of off-policy Q-learning but with a worse dependence on exploration-related parameters. We also derive an explicit rate for $\mathbb{E}[\|Q^{\pi_k} - Q^*\|_\infty^2]$, where $\pi_k$ is the learning policy at iteration $k$. These results reveal that on-policy Q-learning exhibits weaker exploration than its off-policy counterpart but enjoys an exploitation advantage, since its policy converges to an optimal one rather than remaining fixed. Numerical simulations corroborate our theory. Technically, the combination of time-varying learning policies (which induce rapidly time-inhomogeneous Markovian noise) and the minimal assumption on exploration presents significant analytical challenges. To address them, we employ a refined approach that uses the Poisson equation to decompose the Markovian noise corresponding to the lazy transition matrix into a martingale-difference term and residual terms. To control the residual terms under time inhomogeneity, we perform a sensitivity analysis of the Poisson-equation solution with respect to both the Q-function estimate and the learning policy. These tools may further facilitate the analysis of general reinforcement learning algorithms with rapidly time-varying learning policies, such as single-timescale actor-critic methods and learning-in-games algorithms, and are of independent interest.
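As a concrete illustration of the on-policy sampling regime analyzed above, the following Python sketch runs tabular Q-learning along a single trajectory while drawing actions $\varepsilon$-greedily from the current iterate $Q_k$, so the learning policy $\pi_k$ changes at every step and the noise is time-inhomogeneous Markovian. The synthetic MDP interface (`P`, `R`) and the step-size schedule are assumptions made for this sketch, not the authors' algorithm settings or analysis conditions.

```python
import numpy as np

def on_policy_q_learning(P, R, gamma=0.9, eps=0.1, alpha0=0.5, num_steps=50_000, seed=0):
    """Sketch of tabular Q-learning with a time-varying (on-policy) behavior policy.

    P : (S, A, S) array of transition probabilities (known only to the simulator).
    R : (S, A) array of expected rewards.
    The interface and constants are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    S, A, _ = P.shape
    Q = np.zeros((S, A))
    s = int(rng.integers(S))

    for k in range(1, num_steps + 1):
        # epsilon-greedy learning policy pi_k induced by the current iterate Q_k
        if rng.random() < eps:
            a = int(rng.integers(A))
        else:
            a = int(np.argmax(Q[s]))

        s_next = int(rng.choice(S, p=P[s, a]))      # sample the next state from the MDP
        r = R[s, a]

        alpha = alpha0 / (1 + k / 1000)             # diminishing step size (assumed schedule)
        td_target = r + gamma * np.max(Q[s_next])
        Q[s, a] += alpha * (td_target - Q[s, a])    # asynchronous Q-learning update

        s = s_next                                   # single trajectory: Markovian noise
    return Q
```

Because the greedy part of the policy depends on $Q_k$, the induced state-action chain changes at every iteration, which is exactly the rapidly time-inhomogeneous noise the abstract's Poisson-equation argument is designed to handle.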